-
- > Date: Fri, 08 Jan 93 13:57:32 CST
- > From: Dan Connolly <connolly@pixel.convex.com>
- >
-
- > This question seems to confuse two things: the ISOlat1 entity
- > set, and the ISO Latin 1 character set. The first is a mapping
- > of names to glyphs, and the second is a mapping from the numbers
- > 128-255 to glyphs. I think they're in alphabetical order
- > by name, but not in order by the ISO Latin 1 character set.
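To make Dan's distinction concrete, here is a minimal Python sketch. It assumes Python's `html.entities.name2codepoint` table as a stand-in for ISOlat1 (HTML's Latin-1 entity names come from that SGML set). Sorting a few names alphabetically gives one order; sorting them by their ISO Latin 1 number gives another.

```python
from html.entities import name2codepoint

# A few ISOlat1 entity names (the same names appear in HTML's entity table).
names = ["Agrave", "szlig", "ouml", "eacute", "yuml"]

# Entity set: name -> character, here listed alphabetically by name ...
by_name = sorted(names)

# ... versus the character set: number -> character, listed by number.
by_number = sorted(names, key=lambda n: name2codepoint[n])

for n in by_number:
    cp = name2codepoint[n]
    # In ISO 8859-1 the byte value and the character number coincide (0-255).
    print(f"0x{cp:02X}  &{n};  {bytes([cp]).decode('latin-1')}")
```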
-
- I think we should specify ISO Latin 1 as the base set. I think a lot
- of people in the Nordic countries use it routinely, and they will go
- crazy if they have to overload the curly brackets again, as they
- have to with mail.
-
- Therefore, we should allow those people who have 8-bit capability to
- just stick in 8-bit codes. Admittedly, I thought the ISO world kept to
- the codes 21-7E and A1-FE hex for the G0 and G1 graphic sets, using the
- others for control sets (C0 and C1). Maybe ISO Latin 1 has nothing
- to do with ISO 8-bit extensions. Sorry, I can't quote ISO numbers.
- But whatever is common usage, let us have an 8-bit set.
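The code layout Tim recalls can be written down as a small classifier. This is a sketch of our reading of the ISO 2022 scheme, not a quotation of the standard: C0 controls at 00-1F, a graphic set in the left half, C1 controls at 80-9F, and a graphic set in the right half. The function names are hypothetical. One caveat it encodes: a 94-character set uses A1-FE, but ISO 8859-1 puts a 96-character set in the right half, so A0 (no-break space) and FF are graphic too.

```python
def region(code: int) -> str:
    """Name the ISO 2022 area an 8-bit code falls in (sketch)."""
    if not 0 <= code <= 0xFF:
        raise ValueError("not an 8-bit code")
    if code <= 0x1F:
        return "C0"  # 7-bit control set
    if code <= 0x7F:
        # Left-half graphic set; 0x20 SPACE and 0x7F DEL are special
        # positions, and the 94 graphic codes are 0x21-0x7E.
        return "G0"
    if code <= 0x9F:
        return "C1"  # 8-bit control set
    return "G1"      # right-half graphic set


def is_latin1_graphic(code: int) -> bool:
    """True if ISO 8859-1 assigns a printable character (SPACE and NBSP included)."""
    return 0x20 <= code <= 0x7E or 0xA0 <= code <= 0xFF


for c in (0x0A, 0x41, 0x85, 0xF6):
    print(f"0x{c:02X}: {region(c)}, Latin-1 graphic: {is_latin1_graphic(c)}")
```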
-
- (Can anybody illuminate us on this? Has anybody got the ISO Latin 1
- character set listing by number?)
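A listing by number can be generated rather than typed in. This sketch relies on two facts about Python's standard library: the `latin-1` codec maps each byte to the character with the same number, and `unicodedata.name` returns the character's name. It prints the right half, A0-FF, where the accented letters live; the function name `latin1_listing` is hypothetical.

```python
import unicodedata


def latin1_listing(start: int = 0xA0, end: int = 0xFF):
    """Return (number, character, name) rows for a range of ISO 8859-1 codes."""
    rows = []
    for code in range(start, end + 1):
        ch = bytes([code]).decode("latin-1")
        rows.append((code, ch, unicodedata.name(ch, "<unnamed>")))
    return rows


rows = latin1_listing()
for code, ch, name in rows:
    print(f"0x{code:02X} {code:3d}  {ch}  {name}")
```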
-
- Now, for dyed-in-the-wool 7-bit hackers, is it fair to require them to
- remember numbers, or would it be nicer to allow them to put in
- codes using entity names? Some people, I am sure, would like the
- latter, but it is NOT important, because we are aiming for WYSIWYG
- editors and so would regard human-readable character names as a
- temporary thing anyway.
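Both options for 7-bit users can be produced mechanically from 8-bit text. This is a sketch, assuming Python's `html.entities.codepoint2name` as the name table: with names, ö becomes `&ouml;`; without, it becomes the numeric reference `&#246;`. The function `to_7bit` and its `use_names` switch are hypothetical; literal `&` and `<` are escaped as well so the output stays unambiguous.

```python
from html.entities import codepoint2name


def to_7bit(text: str, use_names: bool = True) -> str:
    """Escape every non-ASCII character so the text survives a 7-bit channel."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch == "&":
            out.append("&amp;")  # keep literal ampersands unambiguous
        elif ch == "<":
            out.append("&lt;")   # likewise for tag-open
        elif cp < 128:
            out.append(ch)       # plain ASCII passes through
        elif use_names and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # entity name, e.g. &ouml;
        else:
            out.append(f"&#{cp};")                 # number, e.g. &#246;
    return "".join(out)


print(to_7bit("Göteborg"))                   # G&ouml;teborg
print(to_7bit("Göteborg", use_names=False))  # G&#246;teborg
```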
-
-
- > Here is the crux of the matter:
- >
-
- > >The communication between it and the text object would have to be
- > >defined in terms of a particular character set
- >
-
- > And this character set is stated in the SGML declaration at
- > the top of html.dtd.
-
- No, that is something different. The top of the DTD specifies the
- reference base set for the DTD itself and for SGML documents.
- The interface between two software modules is something else and can
- be independent of that.
-
- > If we define HTML in terms of the
- > full ISO Latin 1 character set, then the parser can deal with
- > &ouml;, and pass it to the text object as a data character, just
- > like an 'A' character. For X displays using iso8859 fonts, that's
- > cool.
-
-
- Sorry, is iso8859 = ISO Latin 1? (I have no head for numbers >1 :-)
-
- Yes, it is cool. Use Midas or Viola to look at the Hyper-G stuff and
- it works very nicely.
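What Dan describes, the parser turning `&ouml;` into a data character just like 'A', can be sketched in a few lines. This assumes references always end in `;` (SGML allows it to be omitted in some contexts) and uses Python's `html.entities.name2codepoint` for the names; `resolve_refs` is a hypothetical name, and Python's own `html.unescape` does a more complete job.

```python
import re
from html.entities import name2codepoint

# A decimal numeric reference (&#246;) or a named reference (&ouml;).
REF = re.compile(r"&(?:#([0-9]+)|([A-Za-z][A-Za-z0-9]*));")


def resolve_refs(text: str) -> str:
    """Replace references with data characters; unknown ones are left as-is."""
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:
            n = int(m.group(1))
            return chr(n) if n <= 0x10FFFF else m.group(0)
        cp = name2codepoint.get(m.group(2))
        return chr(cp) if cp is not None else m.group(0)
    return REF.sub(repl, text)


data = resolve_refs("G&ouml;teborg")
# For an X display with Latin 1 fonts, the data is now one byte per character.
print(data.encode("latin-1"))
```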
-
- > But on a PC or a Mac, that means the text object will have to
- > scan all the data it gets and convert the Latin1 encoding to
- > its own. Yuck.
-
- Yup. Big deal? Not really. Just a set of parallel tables. Peter
- Flynn of the CURIA project is producing a lot of archived Gaelic text and
- is currently dealing with a requirement for a line-mode browser which
- can switch its character set depending on the terminal emulator the
- reader is using.
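The "set of parallel tables" can be illustrated with one 256-entry table per local character set, built once and applied with `bytes.translate`. This is a sketch that assumes Python's standard `cp437` (IBM PC) and `mac_roman` (Macintosh) codecs as the local sets; `build_table` and the `?` fallback for characters with no single-byte equivalent are our choices, not anything from the thread.

```python
def build_table(local_codec: str, fallback: int = ord("?")) -> bytes:
    """256-entry table mapping each Latin-1 byte to a byte in the local set."""
    table = bytearray(256)
    for code in range(256):
        ch = bytes([code]).decode("latin-1")
        try:
            encoded = ch.encode(local_codec)
        except UnicodeEncodeError:
            encoded = b""  # no equivalent in the local set
        table[code] = encoded[0] if len(encoded) == 1 else fallback
    return bytes(table)


TO_CP437 = build_table("cp437")         # IBM PC
TO_MACROMAN = build_table("mac_roman")  # Macintosh

latin1_text = "Göteborg".encode("latin-1")
print(latin1_text.translate(TO_CP437))
print(latin1_text.translate(TO_MACROMAN))
```

The table-per-target design keeps the per-character cost to one lookup, which is the point Tim makes in reply to Dan's "Yuck."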
-
- Problems occur only if there are characters which can't be mapped 1-1
- to the local set and must be represented by more than one character
- (like u-umlaut -> ue, ae diphthong -> ae, etc.) AND you can edit, in
- which case the original form must be preserved. In this case, passing
- on the entity is essential. But doing it for every character >127
- would be a pain memory-wise. So I would suggest that a configurable
- table define which entities can be crunched down to a single
- character in the local representation, and that the rest be passed on from
- the SGML parser to the SGML app as external entities.
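Tim's proposal can be sketched as two steps: derive the "crunchable" table from the local character set, then hand the application local characters where a 1-1 mapping exists and an entity reference everywhere else. Everything here is hypothetical: the function names, the use of Python's `cp437` codec as the local set, and the `("entity", name)` tuple standing in for an external entity passed from parser to application.

```python
from html.entities import codepoint2name


def crunchable(local_codec: str) -> set:
    """Latin-1 code points (A0-FF) that map to exactly one local byte."""
    ok = set()
    for cp in range(0xA0, 0x100):
        try:
            if len(chr(cp).encode(local_codec)) == 1:
                ok.add(cp)
        except UnicodeEncodeError:
            pass  # no local equivalent: must stay an entity
    return ok


def to_app(text: str, crunch: set) -> list:
    """Split parsed text into runs of local characters and preserved entities."""
    out: list = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80 or cp in crunch:
            if out and isinstance(out[-1], str):
                out[-1] += ch   # extend the current run of local text
            else:
                out.append(ch)
        else:
            # Not representable 1-1: pass the entity on unchanged so an
            # editor can write the original form back exactly.
            out.append(("entity", codepoint2name.get(cp, f"#{cp}")))
    return out


table = crunchable("cp437")
print(to_app("Gö Ã", table))
```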
-
- > >... and perhaps if there is more than one
- > >contender the SGML engine could have a compilation option.
- >
-
- > Hmmm... One might argue that as long as we support conversion inside
- > the SGML parser for EBCDIC machines, we might as well support PC and
- > Mac character sets while we're at it.
-
- Yes.
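Conversion inside the parser, with the "compilation option" modelled as a configuration setting, might look like the sketch below. The option names and the `read_document` function are hypothetical; the codec names (`cp037` for a common EBCDIC code page, `cp437` for the IBM PC, `mac_roman` for the Macintosh) are standard Python codecs. The idea is that input is converted once at the front of the parser, so everything after it sees one internal character set.

```python
# Hypothetical option: the character set of documents on this host.
SOURCE_CHARSETS = {
    "ebcdic": "cp037",     # EBCDIC (US/Canada code page)
    "pc": "cp437",         # IBM PC
    "mac": "mac_roman",    # Macintosh
    "latin1": "latin-1",   # ISO 8859-1
}


def read_document(raw: bytes, option: str = "latin1") -> str:
    """Convert raw input to the parser's internal characters before parsing."""
    return raw.decode(SOURCE_CHARSETS[option])


print(read_document(b"G\x94teborg", "pc"))
print(read_document("<P>Göteborg".encode("cp037"), "ebcdic"))
```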
-
- Tim
-
-